feat(security): SSO/JWT authentication migration (Phase 1)#1569
feat(security): SSO/JWT authentication migration (Phase 1)#1569jsell-rh wants to merge 62 commits into
Conversation
Define desired state for migrating from OpenShift OAuth proxy to direct SSO/JWT authentication. Key decisions: - BFF pattern: Next.js as OIDC confidential client, browser gets session cookie - K8s impersonation: backend SA + Impersonate-User/Group preserves RBAC - Dual-path auth: JWT first, TokenReview fallback for API keys - Feature-flagged migration for incremental rollout - Supersedes ADR-0002 (raw token passthrough → impersonation) Includes migration workflow with consumer impact map and implementation notes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
✅ Deploy Preview for cheerful-kitten-f556a0 canceled.
|
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Reference the IAM consolidation proposal (PR #1466) as the long-term direction. This spec is Phase 1; future phases cover API keys → SSO service accounts, runner → OIDC token exchange, DB RBAC reconciler, and credential consolidation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… requirements - OIDC callback must coexist with existing integration auth routes - SSO client configuration requirements (one per environment, audience isolation) - Post-logout redirect URI and web origins specified Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Kind/local-dev environments include Keycloak with pre-configured realm - Replaces static JWKS ConfigMap, DISABLE_AUTH mock mode, and OC_TOKEN - Same JWT validation code path as production (no dev-only auth logic) - Realm config version-controlled as JSON export - E2E tests use local Keycloak in Kind environments - Design decision: Keycloak Identity Brokering for deployed environments Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Keycloak Deployment, realm JSON, and env var config for Kind overlay - Maps what it replaces (static JWKS, DISABLE_AUTH, test-user SA) - Identity Brokering section for deployed environments - Updated manifest changes to include Kind overlay additions/removals Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…d impersonation RBAC Slice 1 of SSO authentication migration (Phase 1): - Deploy Keycloak to Kind cluster with pre-configured realm (ambient-code) including confidential frontend client, public CLI client, and E2E client_credentials client. Dev users: developer/developer, admin/admin. - Add jwtauth package with JWKS-based JWT validation using lestrrat-go/jwx/v2. Validates signature, expiration, issuer, and audience. Extracts OIDC claims (sub, email, preferred_username, groups). - Add impersonate verb on users, groups, and serviceaccounts to backend-api ClusterRole for K8s impersonation under SSO auth. - Fix Kind overlay: relax runAsNonRoot for ambient-api-server, make control-plane OIDC env vars optional. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…mpersonation Slice 2 of SSO authentication migration (Phase 1): - Wire JWT validation into backend middleware: forwardedIdentityMiddleware validates JWT against Keycloak JWKS, extracts identity from OIDC claims (sub, email, preferred_username, groups), and stores validated claims in Gin context for reuse by handlers. - Add dual-path auth in getK8sClientsDefault: JWT validation first, then TokenReview fallback for API keys (K8s ServiceAccount tokens). - Use K8s impersonation (Impersonate-User/Group) instead of raw bearer token when SSO is enabled. Backend SA token + impersonation preserves all existing RBAC enforcement. - Fix SSAR cache key to include impersonated identity instead of shared SA token, preventing cross-user authorization cache leaks. - Gate SSO path behind "sso-authentication" Unleash feature flag. - Add SSO env vars (SSO_ISSUER_URL, SSO_AUDIENCE) to backend Kind overlay. - Fix Keycloak realm: add audience mapper and protocol mappers for sub, email, preferred_username claims in access token. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Slice 3 of SSO authentication migration (Phase 1):
- Add openid-client v6, iron-session v8, and jose as dependencies
- Create OIDC client layer (src/lib/oidc.ts): discovery, authorization URL
construction with PKCE, code exchange, token refresh, end-session URL
- Create encrypted session cookie management (src/lib/session.ts):
iron-session with httpOnly/secure/sameSite cookies, transparent token
refresh when access token is within 60s of expiry
- Add SSO API routes:
- /api/auth/sso/login: generates PKCE, stores verifier/state in cookies,
redirects to Keycloak authorization endpoint
- /api/auth/sso/callback: exchanges code for tokens, stores in session
- /api/auth/sso/logout: destroys session, redirects to Keycloak logout
- Add Next.js middleware: redirects unauthenticated page requests to SSO
login when SSO_ENABLED=true
- Modify buildForwardHeadersAsync: SSO path extracts JWT from session,
sets Authorization: Bearer and X-Forwarded-* headers from JWT claims.
All 97+ consumers are unaffected.
- Update navigation logout to use SSO logout route when enabled
- Update /api/me to accept Authorization header for auth check
- Add SSO env vars to Kind frontend deployment patch
- Support SSO_PUBLIC_ISSUER_URL for Kind dev (browser vs cluster URLs)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Keycloak supports any port for localhost redirect URIs per RFC 8252 section 7.3. Registering http://localhost/* (without port) accepts callbacks on any ephemeral port, eliminating port-forward mismatches. Also set webOrigins to "+" (all valid redirect origins) for CORS. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
openid-client v6 requires a standard URL instance (not NextURL). Construct callback URL from SSO_REDIRECT_URI base to match the redirect_uri sent during authorization, since request.url inside the container resolves to 0.0.0.0:3000 rather than localhost:11646. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
In Kind, Keycloak's iss response parameter uses the public URL (localhost:30090) while openid-client validates against the internal URL (keycloak-service:8080). Remap the iss param before passing to authorizationCodeGrant so RFC 9207 issuer validation passes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Set KC_HOSTNAME to the internal service URL so Keycloak uses a consistent issuer in all tokens and OIDC responses, regardless of whether the browser reaches it via localhost:30090 or the server reaches it via keycloak-service:8080. This eliminates issuer mismatches in ID token validation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
In Kind, the browser reaches Keycloak via localhost:30090 while backend and frontend servers use keycloak-service:8080. Keycloak sets the token issuer based on the authorization session URL, causing mismatches. Fixes: - Add alt issuer support to JWT validator (AddAltIssuer) so the backend accepts tokens from both internal and public Keycloak URLs. Production environments use a single URL and don't need alt issuers. - Use standard openid-client authorizationCodeGrant in production (full ID token validation). Fall back to manual token exchange in dev when SSO_PUBLIC_ISSUER_URL differs from SSO_ISSUER_URL. - Set cookies directly on redirect response in login route (cookies() API mutations don't transfer to NextResponse.redirect). - Derive post-login redirect origin from SSO_REDIRECT_URI to avoid container-internal 0.0.0.0:3000 address. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ints The OIDC discovery config was cached as a module-level singleton with no expiry. If Keycloak restarted and got a new ClusterIP, token refresh calls would fail silently (ECONNREFUSED) and the session would be destroyed, logging the user out. Add a 5-minute TTL so the config is re-discovered periodically. This matches the Keycloak JWKS cache interval and ensures endpoint URLs stay current after dependency restarts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
getUserSubjectFromContext now prefers userEmail (matching the Impersonate-User header) when creating RoleBindings. Previously it used userName (preferred_username), causing a mismatch: the RoleBinding subject would be "developer" but impersonation would use "developer@local.dev", so RBAC checks would fail. This ensures lazy RoleBinding creation in CreateProject works correctly with SSO impersonation — no manual RoleBindings needed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Callback route: redirect to /api/auth/sso/login instead of showing JSON error when OIDC state cookies are missing or exchange fails. Handles stale Keycloak sessions that skip the login page. - Logout route: derive post-logout redirect URI from SSO_REDIRECT_URI to avoid 0.0.0.0:3000 container address. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
NEXT_PUBLIC_* env vars are inlined at build time in Next.js client components, so they're unavailable when the image is built without them. Instead, expose ssoEnabled from the /api/me server route and read it in the navigation component via useCurrentUser(). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The project layout had its own handleLogout hardcoded to /oauth/sign_out, separate from the main navigation. Unified both to use the runtime ssoEnabled flag from useCurrentUser(). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Slice 4 of SSO authentication migration (Phase 1): - Update extract-token.sh to obtain JWT from Keycloak via client_credentials grant (ambient-e2e client). Falls back to K8s SA token when Keycloak is not available. - Add audience and sub protocol mappers to ambient-e2e Keycloak client so tokens have proper aud claim for backend validation. - Add ClusterRoleBinding for e2e service account identity (service-account-ambient-e2e) so E2E tests can access projects. - No developer RoleBindings — JIT provisioning via CreateProject handles first-time access correctly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When the OIDC session expires and token refresh fails, the user now sees a blocking dialog instead of silent 401 errors: - Global 401 detection via QueryCache/MutationCache onError handlers - Skip retries on 401 to prevent request storms against the IdP - Non-dismissable AlertDialog with "Log in" button that preserves returnTo path so users land back on the same page - No "expiring soon" warning — server-side refresh handles access token renewal transparently; only surfaces when refresh token dies Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add logging to getAccessToken so token refresh attempts and failures are visible in pod logs (was silently swallowing errors). - Fix middleware to return 401 JSON for RSC/fetch requests instead of redirecting to Keycloak. Cross-origin redirects fail as XHR and cause CORS errors. Full page navigations still redirect to login. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
openid-client's refreshTokenGrant validates the ID token iss claim in the refresh response, which fails when the token was issued by localhost:30090 but the refresh goes through keycloak-service:8080. Use manual fetch to the token endpoint in split-URL mode (same approach as code exchange). Production uses the library's standard refreshTokenGrant with full validation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The split-URL problem (browser→localhost:30090, server→keycloak-service:8080) caused token issuer mismatches that broke refresh tokens and ID token validation. Every workaround added complexity. Root fix: proxy Keycloak through the frontend at /sso/* so browser and server both reach Keycloak through the same origin. Combined with KC_HOSTNAME=http://keycloak-service:8080, all tokens now have a consistent issuer that matches the discovery endpoint. Changes: - Add /sso/[...path] catch-all route that proxies to Keycloak, rewriting Location headers on redirects - Set KC_HOSTNAME to internal service URL for consistent token issuer - Update SSO_PUBLIC_ISSUER_URL to use the proxy path - Exclude /sso from auth middleware matcher - Remove unused next.config.js rewrites (build-time, not runtime) This eliminates: alt issuers on the backend, manual token exchange fallbacks, iss parameter remapping in callbacks, and CORS errors on session expiry redirects. Production deployments use a single URL and don't need the proxy. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Eliminates the split-URL issuer mismatch by properly configuring Keycloak's hostname-backchannel-dynamic feature: - KC_HOSTNAME=http://localhost:11646/sso — all tokens use the public URL as issuer, login pages render with proxy URLs - KC_HOSTNAME_BACKCHANNEL_DYNAMIC=true — internal services get backchannel URLs (token_endpoint, jwks_uri) via keycloak-service:8080 Frontend changes: - Manual OIDC discovery to bypass openid-client v6's issuer validation (known issue: github.com/panva/openid-client/issues/737) - Remove all split-URL workarounds (manual token exchange, iss remapping, URL rewriting in auth/logout/refresh) - openid-client's standard authorizationCodeGrant and refreshTokenGrant now work correctly for all flows Backend changes: - JWT validator uses discovered issuer from OIDC metadata (not the discovery URL) so it accepts the public issuer in tokens Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Frontend README: replace OC_TOKEN/OAuth proxy header docs with SSO env vars and OIDC session model description - Frontend .env.example: add SSO_* vars, move OC_* to legacy section - Backend README: replace DISABLE_AUTH migration guide with Keycloak dev auth instructions (JWT and SA token examples) - E2E README: update quick start to use extract-token.sh (Keycloak client_credentials with K8s SA fallback) - Kind dev guide: add Keycloak to bootstrap steps, document dev credentials and session lifetimes - CONTRIBUTING.md: add Keycloak to kind-up description, update access instructions with login info - OPENSHIFT_OAUTH.md: mark as legacy, link to SSO spec Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…facing The sso-authentication Unleash flag controls which auth path the backend uses. It is not visible in workspace settings and is not user-configurable — ops enables it per-environment during migration. Kind dev cluster creates and enables it automatically. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…bient-code/platform into jsell/spec/sso-authentication
Use client_id instead of id_token_hint for the end-session URL when the id_token is not available in the session (dropped to stay under the 4KB cookie size limit). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…flag OverrideConfig previously hardcoded the JWKS URL to sso.redhat.com, ignoring both --jwk-cert-url flag and environment variables. This makes it impossible to use a different OIDC provider (e.g. Keycloak) without a code change. Priority is now: JWK_CERT_URL env var > --jwk-cert-url flag > default. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds complete workflow for deploying Ambient with native Keycloak SSO, bypassing the legacy oauth-proxy and agentic-operator. Includes overlay for jsell-sso-poc namespace on hcmais cluster with all patches for api-server, control plane, and RBAC. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add SSO_ENABLED env var so SSO works without Unleash feature flag - Prefer preferred_username over email for k8s impersonation (matches Keycloak identity brokering claim order) - Default to system:authenticated group when JWT has no groups - Add debug logging for JWT validation failures - Fix roleBindings migration ID to avoid conflict Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Organize overlays by cluster. hcmais/ now contains both the Keycloak deployment and the jsell-sso-poc ambient deployment. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…bases The ID was changed for a PoC database conflict. Reverting to the original ID so production databases that already ran this migration aren't affected. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ing databases" This reverts commit 56532bf.
The integration test package had a TestMain (server bootstrap) but zero test functions. Go test treats this as a failure in verbose mode. Added a minimal test that validates the server starts successfully. Pre-existing issue since 09b88a4, not caused by this PR. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…bient-code/platform into jsell/spec/sso-authentication
Review — feat(security): SSO/JWT authentication migration (Phase 1)Solid implementation overall. The BFF pattern (Next.js as confidential OIDC client, browser never sees a raw JWT) is the right call, and the dual-path auth (JWT → TokenReview fallback) makes incremental rollout safe. A few things worth a second look before merge: 🔴 Migration ID — verify final stateThe commit history shows three conflicting commits on the
The PR description says "Idempotent — safe for existing databases where the old ID already ran" but does not explain the revert-of-revert. If existing prod databases already have 🟡
|
…I spec
The OpenAPI spec defines credentials under /projects/{id}/credentials/
but the route registration only had /credentials/. The generated SDK
client calls the project-scoped path, causing all credential
integration tests to 404.
Registers the same handlers under both paths.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CI Failure —
|
…obber Kustomize's namespace transformer rewrites all subject namespaces in ClusterRoleBindings, which breaks the ambient-code namespace bindings when deploying to a different namespace. Move CRBs to a separate file applied independently of kustomize. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Review responseThanks for the thorough review @mturansk. All points addressed in 🔴 Migration IDVerified against production DB. The 🟡 e2e-login hardeningAdded 🟡 preferred_username vs email consistencyAligned 🟡 JWK cert URL priorityFixed to conventional ordering: CLI flag > env var > default. CI failuresBoth were pre-existing:
|
- e2e-login: add AMBIENT_ENV=production guard alongside E2E_TEST_HELPERS to prevent accidental auth bypass in production - impersonation: align getUserSubjectFromContext claim order with buildImpersonatingClients (preferred_username > email > sub) so RoleBinding subjects match the impersonated identity - JWK_CERT_URL: fix priority to CLI flag > env var > default (standard convention, per review feedback) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
e89e44a to
084e113
Compare
markturansky
left a comment
There was a problem hiding this comment.
Amber Code Review — feat(security): SSO/JWT authentication migration (Phase 1)
Summary
Substantial, well-structured SSO migration. The core auth path — JWKS-based JWT validation, K8s impersonation, dual-path (JWT + TokenReview) fallback, and encrypted httpOnly session cookie — is architecturally sound and the approach is correct. One critical security flaw in the OIDC callback (open redirect) must be fixed before merge, along with a few major issues around the Keycloak proxy surface and committed credential placeholders.
Findings
Critical
Open redirect in OIDC callback — components/frontend/src/app/api/auth/sso/callback/route.ts
const returnTo = cookieStore.get("oidc_return_to")?.value || "/";
// ...
return NextResponse.redirect(new URL(returnTo, origin));new URL(absoluteUrl, base) ignores base when the first argument is already absolute. An attacker can craft /api/auth/sso/login?returnTo=https://evil.com; the server stores it in the httpOnly oidc_return_to cookie; after successful OIDC login the user is silently redirected to evil.com. Classic phishing vector.
Fix: validate returnTo is a relative path before storing/using it:
function safeReturnTo(value: string | undefined): string {
if (!value) return "/";
try {
const url = new URL(value, "http://localhost");
// Reject if the input was already absolute (has its own host)
if (new URL(value, "http://localhost").origin !== "http://localhost") return "/";
return url.pathname + url.search;
} catch { return "/"; }
}Apply in both login/route.ts (before storing to cookie) and callback/route.ts (before using from cookie).
Major
1. CHANGE_ME placeholders committed to cluster overlay — components/manifests/overlays/hcmais/jsell-sso-poc/
sso-credentials.yaml has SSO_CLIENT_SECRET: CHANGE_ME, SESSION_SECRET: CHANGE_ME. keycloak-admin-secret.yaml has password: CHANGE_ME. keycloak-db.yaml has POSTGRES_PASSWORD: CHANGE_ME. keycloak-realm.json has "clientSecret": "CHANGE_ME" for both the frontend client and the RH SSO identity provider.
These are in a cluster-specific overlay intended for production (hcmais). If applied without rotation the session cookie encryption key is trivially known. Should either:
- Add a
# REQUIRED: rotate before applyingcomment block and a pre-flight check in the deployment workflow - Or use ExternalSecret / SealedSecret so real values are never committed
The Kind overlay correctly uses dev-secret-do-not-use-in-prod labels and is isolated to dev. The hcmais overlay needs the same level of care.
2. Unrestricted Keycloak proxy — components/frontend/src/app/sso/[...path]/route.ts
This route proxies all GET and POST requests to SSO_ORIGIN (Keycloak) with no path allowlisting. It exposes every Keycloak HTTP endpoint to the public internet via the frontend, including the admin REST API (/auth/admin/...), token introspection, user endpoints, etc. Keycloak has its own auth on admin routes, but this needlessly widens the attack surface and makes ingress WAF rules harder.
Suggested: restrict to paths the OIDC flow actually needs:
const ALLOWED_PATHS = /^(\/realms\/[^/]+\/(protocol\/openid-connect|\.well-known))/;
if (!ALLOWED_PATHS.test(`/${path}`)) {
return NextResponse.json({ error: "Not found" }, { status: 404 });
}3. E2E auth bypass reachable with SSO enabled on Kind — components/manifests/overlays/kind/frontend-test-patch.yaml
E2E_TEST_HELPERS=true is set unconditionally in the Kind test patch. The e2e-login route's second guard (AMBIENT_ENV === "production") is only meaningful if production overlays explicitly set AMBIENT_ENV=production on the frontend deployment. The Kind overlay doesn't set it, so after make kind-sso-toggle (SSO on), the auth bypass endpoint is live at /api/auth/sso/e2e-login for anyone who can reach the Kind cluster and knows the route exists.
Dev-only cluster, so production impact is nil. But a developer running Kind SSO mode with external access (e.g., port-forwarded to a shared network) is exposed. Consider removing E2E_TEST_HELPERS from the standard Kind patch and requiring it to be set via a separate make kind-e2e-mode toggle, or add AMBIENT_ENV=development to the route guard so the check is explicit both ways.
Minor
4. Module-level OIDC config cache — components/frontend/src/lib/oidc.ts:5-9
cachedConfig and cachedAt are module-level variables. In serverless / Next.js edge deployments, module state is not guaranteed to persist across invocations, which could cause repeated OIDC discovery on every request. Consider using lru-cache or an explicit cache tied to the Next.js request lifecycle, or document that this is intentional for long-lived Node.js processes only.
5. Silent session destroy on missing refresh token — components/frontend/src/lib/session.ts:60-63
When the access token is expired and there's no refresh token, session.destroy() is called silently. No log entry makes it hard to debug "why did my session disappear?" Add: console.warn("SSO: session expired with no refresh token, destroying").
6. Kind Keycloak SecurityContext — components/manifests/overlays/kind/keycloak-deployment.yaml:16
runAsNonRoot: false for Kind Keycloak and Kind api-server (kind/api-server-security-patch.yaml). CLAUDE.md mandates runAsNonRoot on all containers. Dev-only, so no prod risk, but worth noting the deliberate divergence with an inline comment explaining why Keycloak requires it (# Keycloak upstream image requires root; acceptable in Kind dev only).
7. Empty smoke test — components/ambient-api-server/test/integration/integration_test.go:258-261
TestServerStarts has no assertions — the comment explains the intent, but this pattern is fragile: a future refactor that skips TestMain would leave a silently-passing test. Consider require.NotNil(t, helper.AppConfig()) or equivalent to make the assertion explicit.
Positive Highlights
- JWT validator is well-built: JWKS cache with 5-min min-refresh, proper issuer/audience/expiration validation, manual issuer check to support alt-issuers (dev vs. public URL). The
validator_test.gois thorough — expired, tampered signature, wrong issuer/audience, malformed, empty, minimal claims — all covered. - K8s impersonation approach is the right call: no cluster OIDC federation required, all existing RBAC bindings continue to work, SA tokens (API keys) accepted via TokenReview fallback. Identity claim resolution order (
preferred_username > email > sub) is consistent acrossbuildImpersonatingClients,setIdentityFromClaims, andgetUserSubjectFromContext. - SSAR cache key improved from raw bearer token to OIDC sub — reduces cache churn on token rotation and is more correct semantically.
InitJWTValidatornon-fatal design: server starts normally without SSO configured; the feature flag is also off in that state. Clean.- Session cookie correctly configured: httpOnly, SameSite=lax, encrypted via iron-session.
- PKCE+state implemented correctly for Authorization Code Flow.
- Migration ID change well-reasoned and idempotency analysis is rigorous.
forwardedIdentityMiddlewarefalls through to legacy header-based path on JWT failure, preserving backward compatibility.
Recommendations (priority order)
- Fix open redirect in
callback/route.tsandlogin/route.ts— validatereturnTois relative before storing/using. (~10 lines) - Add path allowlist to
sso/[...path]/route.ts— block everything except/realms/*/protocol/openid-connect/*and/.well-known/*. - Address CHANGE_ME placeholders in
hcmais/jsell-sso-poc/— add deployment workflow gate or switch to ExternalSecret/SealedSecret. - Document / isolate E2E_TEST_HELPERS — avoid having it enabled by default on the Kind overlay when SSO is toggled on.
- Items 4–7 are quality improvements that can be addressed in follow-up tickets.
Review by Amber · standards refs: specs/standards/security/security.spec.md, skills/amber-review/SKILL.md, CLAUDE.md
Critical: - Fix open redirect in OIDC callback — validate returnTo is a relative path before storing in cookie and before redirecting. Absolute URLs (e.g. https://evil.com) are rejected and replaced with "/". Major: - Add path allowlist to Keycloak proxy — only OIDC-required paths (realms/*/protocol/openid-connect, .well-known, login-actions, resources) are proxied. Admin API and other endpoints return 404. - Add warning comments to CHANGE_ME placeholders in sso-credentials.yaml pointing to the deployment workflow. Minor: - Add console.warn when session is destroyed due to missing refresh token - Add real assertion to TestServerStarts (verify helper initialized and server URL is non-empty) - Add inline comment explaining runAsNonRoot:false on Kind Keycloak Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Response to Amber code review (#4339633631)All findings addressed in Critical — Open redirectFixed. Added Major 1 — CHANGE_ME placeholdersAdded warning comment block pointing to Major 2 — Keycloak proxy allowlistAdded Major 3 — E2E auth bypassAlready addressed in prior commit — added Minor items
|
markturansky
left a comment
There was a problem hiding this comment.
Re-review: SSO/JWT Authentication Migration (Phase 1)
Holistic re-review after fixup commit
1d194b00. Previous review posted 2026-05-21.
Fixes verified ✅
All six items from the initial review were addressed in the fixup commit. Each one is correctly implemented:
Open redirect in callback/login routes — safeReturnTo() now resolves all inputs against http://localhost and rejects anything whose origin differs from that base. Tested mentally against the classic attack vectors: https://evil.com → origin mismatch → /; //evil.com/path → resolves to http://evil.com/path → origin mismatch → /; javascript:alert(1) → URL parse throws → catch → /; /projects/foo → same origin → /projects/foo. Fix is correct.
Keycloak proxy path allowlist — regex /^realms\/[^/]+\/(protocol\/openid-connect|\.well-known|login-actions|resources)\b/ blocks everything except the four legitimate Keycloak endpoint families. Correct.
E2E auth bypass double guard — now requires both E2E_TEST_HELPERS === "true" and AMBIENT_ENV !== "production" before the bypass activates. Returns 404 otherwise. Correct.
CHANGE_ME comment on sso-credentials.yaml — header comment added, clearly marking placeholder values and linking to the deploy workflow. Correct.
runAsNonRoot in Kind overlay — inline comment explains why root is acceptable for the Keycloak upstream image in a dev-only context. Correct.
Integration test guard — TestServerStarts now fails fast with a clear message if TestMain didn't initialize the helper. Correct.
Remaining gaps ⚠️
1. CHANGE_ME comment missing from three sibling overlay files (minor → should fix)
sso-credentials.yaml got the # REQUIRED: Replace CHANGE_ME values before deploying header, but the other three files in the same overlay directory did not:
components/manifests/overlays/hcmais/jsell-sso-poc/keycloak-admin-secret.yamlcomponents/manifests/overlays/hcmais/jsell-sso-poc/keycloak-db.yamlcomponents/manifests/overlays/hcmais/jsell-sso-poc/keycloak-realm.json
All three contain literal placeholder strings that would break a real deployment if applied as-is. Same fix needed: add the required-replace comment at the top of each file.
2. E2E_TEST_HELPERS=true unconditional in Kind patch (minor → should fix)
components/manifests/overlays/kind/frontend-test-patch.yaml still sets E2E_TEST_HELPERS=true unconditionally. The E2E route is now double-guarded in code, but anyone running the Kind overlay gets an always-on test-helper surface. This should either be removed from the base Kind patch and injected only in CI test runs, or at minimum carry a comment explaining the intent. As it stands, any developer who applies the Kind overlay locally activates the bypass endpoint.
New bug: session.idToken is never stored 🐛 (must fix before merge)
logout/route.ts reads session.idToken to build the OIDC end_session_endpoint URL:
// logout/route.ts
const endSessionUrl = await getEndSessionUrl(
session.idToken, // ← always undefined
redirectUrl,
);getEndSessionUrl uses idToken as the id_token_hint parameter, which tells the identity provider which session to terminate. If it is undefined, the function falls back to passing only client_id, which means Keycloak may not terminate the upstream SSO session — only the local cookie is destroyed. Users will find themselves re-logged-in silently the next time they visit, which defeats single sign-out.
The root cause is in callback/route.ts. exchangeCode() returns idToken, but the callback only persists three of the four fields:
// callback/route.ts (current)
const tokens = await exchangeCode(code, codeVerifier, redirectUri);
session.accessToken = tokens.accessToken;
session.refreshToken = tokens.refreshToken;
// session.idToken is dropped here ← bug
session.expiresAt = tokens.expiresAt;
await session.save();Fix is one line:
session.idToken = tokens.idToken;This must be added before session.save() in the callback route.
Style nit (optional)
safeReturnTo() is copy-pasted identically into both login/route.ts and callback/route.ts. It should live in a shared module (e.g. src/lib/auth-utils.ts) and be imported from both routes. Not a blocker, but worth cleaning up before this ships to production to avoid the two copies drifting.
Verdict
The fixup commit is a solid response to the initial review — all six critical and major issues are correctly addressed. Two minor gaps remain (CHANGE_ME comments on sibling files, unconditional E2E_TEST_HELPERS in Kind patch), and there is one new must-fix bug: session.idToken is never persisted to the iron-session cookie, which silently breaks OIDC single sign-out.
Request changes on the idToken omission. The other items can be addressed in the same commit. Happy to re-review once those three lines are in.
…E_ME Must-fix: - Persist idToken to session cookie in OIDC callback so logout can pass id_token_hint to Keycloak end_session_endpoint (single sign-out) Cleanup: - Extract safeReturnTo() to shared lib/auth-utils.ts, import from both login and callback routes - Add CHANGE_ME warnings to keycloak-admin-secret.yaml, keycloak-db.yaml, and keycloak kustomization.yaml (keycloak-realm.json is JSON, noted in kustomization instead) - Add security comment to E2E_TEST_HELPERS in Kind overlay Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Response to re-reviewAll items addressed in idToken bug (must-fix): Added CHANGE_ME on sibling files: Added warnings to safeReturnTo dedup: Extracted to E2E_TEST_HELPERS comment: Added security note to the Kind overlay explaining the double-guard and advising against shared-network exposure. |
…bber The sso-credentials.yaml contains CHANGE_ME placeholders. Including it in kustomize resources causes oc apply -k to overwrite manually-set secret values. Same pattern as the CRB namespace clobber fix. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
markturansky
left a comment
There was a problem hiding this comment.
Third Pass — Thorough Line-by-Line Review
Reviewed after commit
05bb922e. All three previous fixup commits now absorbed.
All prior issues confirmed resolved ✅
Every finding from reviews 1 and 2 is correctly addressed. I walked each file end-to-end below; I won't rehash what's already fixed.
New findings
MAJOR — hcmais Keycloak realm missing groups scope
File: components/manifests/overlays/hcmais/keycloak/keycloak-realm.json
The Kind realm correctly defines a realm-level groups clientScope with an oidc-group-membership-mapper, and all three clients include it in defaultClientScopes. The hcmais realm does neither:
Kind realm: ambient-frontend defaultClientScopes: ['openid', 'email', 'profile', 'groups'] ✅
hcmais realm: ambient-frontend defaultClientScopes: ['openid', 'email', 'profile'] ✗
There is no groups clientScope definition at the realm level and no per-client protocol mapper for group membership. Tokens issued by the hcmais Keycloak will have no groups claim.
In buildImpersonatingClients (handlers/sso.go):
groups := claims.Groups
if len(groups) == 0 {
groups = []string{"system:authenticated"}
}Every SSO user on hcmais will be impersonated as system:authenticated and nothing else. Any K8s RoleBinding bound to a specific group (e.g., an LDAP-synced group like ambient-users) will not be honoured. Users may see unexpected 403s on resources gated by group membership.
Fix: Add the groups clientScope to the hcmais realm, mirroring the Kind config:
"clientScopes": [
{
"name": "groups",
"protocol": "openid-connect",
"protocolMappers": [{
"name": "groups",
"protocolMapper": "oidc-group-membership-mapper",
"config": {
"full.path": "false",
"access.token.claim": "true",
"id.token.claim": "true",
"claim.name": "groups"
}
}]
}
]And add "groups" to ambient-frontend.defaultClientScopes.
MAJOR — Identity claim precedence is inconsistent across the stack
There are three places that resolve "who is this user"; all three use a different priority order:
| Site | Priority order |
|---|---|
buildImpersonatingClients (sso.go) |
preferred_username → email → sub |
setIdentityFromClaims — userID (server.go) |
email → sub |
setIdentityFromClaims — userName (server.go) |
preferred_username → email |
K8s impersonation uses preferred_username as the primary. The context userID uses email as the primary. For any user that has both claims (the common case), the K8s identity is their preferred_username (e.g. jsell) but the application userID is their email (e.g. jsell@redhat.com). These two values are used in different places — userID for secret key naming, credential lookup; K8s identity for RBAC. If any RBAC RoleBinding is authored with the email address as the subject, it would match K8s impersonation only when preferred_username is absent. If it is authored with the username, userID-keyed lookups won't match.
The root fix is to make setIdentityFromClaims use preferred_username first for userID, matching the impersonation order:
// Consistent with buildImpersonatingClients
primaryID := claims.PreferredUsername
if primaryID == "" {
primaryID = claims.Email
}
if primaryID == "" {
primaryID = claims.Sub
}
c.Set("userID", SanitizeUserID(primaryID))
c.Set("userIDOriginal", primaryID)MINOR — SSO middleware falls through to X-Forwarded-* headers after JWT failure
File: components/backend/server/server.go, forwardedIdentityMiddleware
When SSO is enabled and JWT validation fails (e.g., the request carries an API key that is not a Keycloak JWT), the middleware logs the failure and falls through to the legacy OAuth proxy path:
// JWT validation failed — fall through to header-based extraction
// (API keys will be handled by getK8sClientsDefault via TokenReview)
// [falls through to reading X-Forwarded-User, X-Forwarded-Email, etc.]The comment is correct — API key identity is re-established by setIdentityFromTokenReview inside getK8sClientsDefault, which overwrites whatever the middleware set. So in practice no handler sees wrong identity. But the middleware shouldn't trust X-Forwarded-* headers at all when SSO is the active auth mode — those headers belonged to the OpenShift OAuth proxy era and should be treated as untrusted in the SSO path.
The minimal fix is a guarded return:
if SSOEnabled() {
// JWT failed AND we're in SSO mode — don't fall through to OAuth proxy headers.
// Identity will be established by tokenReviewIdentity in getK8sClientsDefault.
c.Next()
return
}This is defense-in-depth: it closes the theoretical path where a direct-to-backend caller sets crafted X-Forwarded-* headers alongside an API key token.
MINOR — OIDC authorization request only asks for openid scope
File: components/frontend/src/lib/oidc.ts, buildAuthorizationUrl
scope: "openid",The code later depends on email and preferred_username claims. These are delivered by Keycloak because email and profile are in the client's defaultClientScopes, but the request doesn't ask for them explicitly. If the Keycloak admin ever changes the default scopes, the claims silently disappear and auth degrades in a hard-to-diagnose way. Should be:
scope: "openid email profile",MINOR — getAccessToken() doesn't persist refreshed idToken
File: components/frontend/src/lib/session.ts
refreshOIDCTokens returns a new idToken (Keycloak always issues a new one on refresh). The refresh path saves accessToken, refreshToken, and expiresAt but drops idToken:
session.accessToken = tokens.accessToken;
session.refreshToken = tokens.refreshToken;
// session.idToken not updated here
session.expiresAt = tokens.expiresAt;The old idToken from initial login continues to work as id_token_hint at logout (Keycloak accepts it), so this isn't broken today. But Keycloak rotates session keys on refresh, and some configurations invalidate old ID tokens. Add session.idToken = tokens.idToken to the refresh block for correctness.
MINOR — ambient-cli client lacks an audience mapper
File: components/manifests/overlays/hcmais/keycloak/keycloak-realm.json (and Kind equivalent)
The ambient-frontend client has a protocol mapper that adds ambient-frontend to the aud claim of access tokens. The ambient-cli client has no such mapper.
The backend validates JWT audience against SSO_AUDIENCE (which is ambient-frontend). A user who authenticates via the CLI client will receive a token without aud: ambient-frontend, and the backend will reject it with a validation error. If the CLI is intended to reach the backend in Phase 1 (or Phase 2), this needs an audience mapper on the CLI client. If the CLI is out of scope for now, add a comment in the realm JSON noting this.
MINOR — new URL(SSO_ISSUER_URL) at module init in Keycloak proxy
File: components/frontend/src/app/sso/[...path]/route.ts
const SSO_ORIGIN = process.env.SSO_ISSUER_URL
? new URL(process.env.SSO_ISSUER_URL).origin
: null;This executes at module load time. If SSO_ISSUER_URL is set to a malformed value (missing protocol, trailing space, etc.), new URL() throws a TypeError that propagates as an unhandled module-load error, crashing the Next.js worker. Wrap it:
function parseSSOOrigin(): string | null {
try {
return process.env.SSO_ISSUER_URL
? new URL(process.env.SSO_ISSUER_URL).origin
: null;
} catch {
console.error('SSO_ISSUER_URL is not a valid URL:', process.env.SSO_ISSUER_URL);
return null;
}
}
const SSO_ORIGIN = parseSSOOrigin();Nits
Keycloak proxy Location rewrite uses String.replace (first match only):
value.replace(SSO_ORIGIN, request.nextUrl.origin + "/sso")String.replace with a string argument replaces only the first occurrence. Use replaceAll or a global regex. Unlikely to matter in practice (Location headers don't repeat the origin), but correct API is replaceAll.
hcmais realm client secret in keycloak-realm.json is CHANGE_ME:
If the realm JSON is imported into Keycloak as-is (which the --import-realm flag does on startup), the ambient-frontend Keycloak client secret will be set to CHANGE_ME, separate from the SSO_CLIENT_SECRET in the K8s secret. Both must be updated together. The kustomization now has a comment about this; consider adding a line specifically calling out the realm JSON client secret requirement: "secret": "<must match SSO_CLIENT_SECRET>".
Overall assessment
The core implementation is solid. The JWT validator, JWKS cache, K8s impersonation, BFF OIDC session model, and E2E bypass guards are all well-designed. The two major findings (groups scope gap and identity precedence inconsistency) should be addressed before this goes to production — they don't block the PoC deployment but will cause real RBAC failures when users with group-based access controls hit the system.
The five minor items are fixes, not blockers for the PoC. Requesting changes on the groups scope and identity precedence; everything else is advisory.
iron-session's cookie limit (4096 bytes) is exceeded when idToken is included alongside accessToken and refreshToken — especially with Keycloak identity brokering which produces large tokens. Store idToken in a dedicated httpOnly oidc_id_token cookie. Logout reads it from there for id_token_hint. Session type no longer includes idToken. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Redirect post-logout to /logged-out instead of app root (which triggers auto-redirect back to SSO, re-authenticating silently) - Add /logged-out page excluded from SSO middleware - Include ssoEnabled in /api/me response even when unauthenticated so the logout button always uses the correct path Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…s scope, proxy hardening Align setIdentityFromClaims with buildImpersonatingClients claim order (preferred_username > email > sub) so userID and K8s impersonation identity are consistent. Guard X-Forwarded-* header fallthrough when SSO is enabled. Add groups clientScope and audience mapper to both Kind and hcmais Keycloak realms. Request explicit OIDC scopes, persist refreshed idToken, and harden the Keycloak proxy init. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
SSO authentication migration — replaces OpenShift OAuth proxy with direct OIDC authentication via Keycloak. Includes a validated end-to-end native SSO deployment path (api-server + control plane) that bypasses the legacy backend and operator entirely.
What's implemented
Frontend BFF OIDC: Next.js acts as confidential OIDC client. Browser gets httpOnly session cookie, never sees a raw JWT. Authorization Code Flow with PKCE. Transparent token refresh (5-min access tokens, 30-min sessions).
Backend JWT validation: JWKS-based validation via
lestrrat-go/jwx/v2. Validates signature, expiration, issuer, and audience.K8s impersonation: Backend SA +
Impersonate-User/Impersonate-Groupheaders preserve all existing RBAC enforcement without cluster OIDC federation. Preferspreferred_usernameoveremailfor impersonation (matches Keycloak identity brokering claim order). Defaults tosystem:authenticatedgroup when JWT has no groups.Dual-path auth: JWT validation first, K8s TokenReview fallback for API keys (SA tokens). Both paths use impersonation.
SSO_ENABLED env var: SSO can be enabled via env var without depending on Unleash feature flag.
API server JWK_CERT_URL: Production environment no longer hardcodes the JWKS URL. Configurable via
JWK_CERT_URLenv var or--jwk-cert-urlCLI flag, enabling non-RH-SSO OIDC providers (e.g., Keycloak).Session expired UX: Global 401 detection via React Query cache, blocking dialog with login redirect, no retry storms.
E2E test auth:
client_credentialsgrant from Keycloak with K8s SA fallback.Local Keycloak: Kind cluster includes Keycloak with pre-configured realm, dev users, and protocol mappers.
make kind-sso-toggleswitches between SSO and legacy mode.Native SSO deployment workflow: Complete workflow for deploying with Keycloak SSO, validated on hcmais cluster. See
workflows/deploy-native-sso.md.Migration ordering fix: roleBindings
typedFKMigrationID changed from202505130001to202603100139to sort after the table-creation migration it depends on (202603100138creates therole_bindingstable; the old ID sorted before it, breaking fresh databases).Production DB impact verified: the
ambient-codestaging RDS already has202505130001in its migrations table. Shipping202603100139means gormigrate treats it as a new migration and runs it. Every statement usesIF EXISTS/IF NOT EXISTS/DROP NOT NULL(all idempotent), so all operations no-op on production. The migrations table will contain both IDs — redundant but harmless. Fresh databases (Kind, new namespaces) will only have202603100139in the correct sort position.Deployment overlays
Overlays reorganized by cluster under
components/manifests/overlays/hcmais/:hcmais/jsell-sso-poc/hcmais/keycloak/Key files
specs/security/sso-authentication.spec.mdworkflows/deploy-native-sso.mdcomponents/ambient-api-server/cmd/ambient-api-server/environments/e_production.gocomponents/backend/handlers/sso.go,server/server.gocomponents/frontend/src/lib/oidc.ts,session.ts,auth.tscomponents/manifests/overlays/hcmais/jsell-sso-poc/components/manifests/overlays/hcmais/keycloak/base/rbac/backend-clusterrole.yaml,base/rbac/control-plane-*Default behavior
make kind-updeploys with legacy auth (SA token, no Keycloak redirect)make kind-sso-toggleenables Keycloak OIDC for both frontend and backendSSO_ENABLED=trueor thesso-authenticationUnleash flagsso.redhat.comJWKS ifJWK_CERT_URLis not setNative SSO deployment (validated)
The end-to-end path without legacy components was validated on the hcmais cluster:
Prerequisites for a new environment:
ambient-frontend,ambient-control-plane(service account),ambient-cli(public)sso-credentials,control-plane-oidc,ambient-vertexworkflows/deploy-native-sso.mdfor the complete checklistTest plan
make kind-sso-toggleswitches between SSO and legacy modeclient_credentials🤖 Generated with Claude Code